140 research outputs found

Generating Diverse and Meaningful Captions

Author: A Karpathy
I Goodfellow
O Russakovsky
O Vinyals
P Anderson
R Bernardi
S Hochreiter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Image captioning is a task that requires models to acquire a multimodal understanding of the world and to express this understanding in natural-language text. While the state of the art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an image retrieval model. We summarize previous results and improve the state of the art on caption diversity and novelty. We make our source code publicly available online.
Comment: Accepted for presentation at The 27th International Conference on Artificial Neural Networks (ICANN 2018)

arXiv.org e-Print Archive

Crossref

Arrow@TUDublin
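
The abstract reports gains in caption diversity and novelty without defining those measures. A minimal sketch, assuming two commonly used definitions that may differ from the paper's exact evaluation protocol: novelty as the fraction of generated captions that never occur verbatim in the training captions, and a distinct-caption ratio as the fraction of unique captions across test images. The function names `novelty` and `distinct_ratio` are hypothetical.

```python
from typing import Iterable, Sequence


def _normalize(caption: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences
    # do not make two captions count as different.
    return " ".join(caption.lower().split())


def novelty(generated: Sequence[str], training: Iterable[str]) -> float:
    """Fraction of generated captions that never appear verbatim in the training set."""
    if not generated:
        return 0.0
    seen = {_normalize(c) for c in training}
    novel = sum(1 for c in generated if _normalize(c) not in seen)
    return novel / len(generated)


def distinct_ratio(generated: Sequence[str]) -> float:
    """Fraction of generated captions that are unique across the test images."""
    if not generated:
        return 0.0
    return len({_normalize(c) for c in generated}) / len(generated)


if __name__ == "__main__":
    train = ["a man riding a horse", "a dog on a couch"]
    gen = ["a man riding a horse", "a brown dog sleeping on a red couch",
           "a man riding a horse", "two kids flying a kite"]
    print(novelty(gen, train))   # 0.5
    print(distinct_ratio(gen))   # 0.75
```

A model that emits the same generic caption for similar images scores low on both measures, which is the failure mode the abstract targets.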

Recurrent Fusion Network for Image Captioning

Author: H Xu
K He
O Vinyals
P Anderson
RJ Williams
S Hochreiter
T-Y Lin
ZH Zhou
Publication date: 30/07/2018

Recently, much progress has been made in image captioning, and the encoder-decoder framework has been adopted by all state-of-the-art models. Under this framework, an input image is encoded by a convolutional neural network (CNN) and then translated into natural language with a recurrent neural network (RNN). Existing models built on this framework employ only one kind of CNN, e.g., ResNet or Inception-X, which describes image contents from only one specific viewpoint. Thus, the semantic meaning of an input image cannot be comprehensively understood, which restricts captioning performance. In this paper, to exploit the complementary information from multiple encoders, we propose a novel Recurrent Fusion Network (RFNet) for image captioning. The fusion process in our model can exploit the interactions among the outputs of the image encoders and then generate new compact yet informative representations for the decoder. Experiments on the MSCOCO dataset demonstrate the effectiveness of the proposed RFNet, which sets a new state of the art for image captioning.
Comment: ECCV-1

arXiv.org e-Print Archive

Crossref

Neural Networks for Information Retrieval

Author: Bahdanau D.
Bordes A.
Goodfellow I.
Hermann K. M.
Hu B.
Kingma D.
Krizhevsky A.
Kusner M. J.
Lin Y.
Lu Z.
Mikolov T.
Robertson S. E.
Srivastava N.
Sutskever I.
Vinyals O.
Weston J.
Publication date: 01/01/2017

Machine learning plays a role in many aspects of modern IR systems, and deep learning is applied in all of them. The fast pace of modern-day research has given rise to many different approaches to many different IR problems. The amount of information available can be overwhelming both for junior students and for experienced researchers looking for new research topics and directions. Additionally, it is interesting to see what key insights into IR problems the new technologies are able to give us. The aim of this full-day tutorial is to give a clear overview of current tried-and-trusted neural methods in IR and how they benefit IR research. It covers key architectures, as well as the most promising future directions.
Comment: Overview of full-day tutorial at SIGIR 201

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Guest Editorial: Non-Euclidean Machine Learning

Author: Bronstein M.
Bruna J.
Cohen T.
Gori M.
Leskovec J.
Lio P.
Song L.
Vinyals O.
Zafeiriou S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022

Over the past decade, deep learning has had a revolutionary impact on a broad range of fields such as computer vision and image processing, computational photography, medical imaging, and speech and language analysis and synthesis. Deep learning technologies are estimated to have added billions in business value, created new markets, and transformed entire industrial segments. Most of today’s successful deep learning methods, such as Convolutional Neural Networks (CNNs), rely on classical signal processing models that limit their applicability to data with an underlying Euclidean grid-like structure, e.g., images or acoustic signals. Yet many applications deal with non-Euclidean (graph- or manifold-structured) data. For example, in social network analysis, users and their attributes are generally modeled as signals on the vertices of graphs. In biology, protein-to-protein interactions are modeled as graphs. In computer vision and graphics, 3D objects are modeled as meshes or point clouds. Furthermore, a graph representation is a very natural way to describe interactions between objects or signals. The classical deep learning paradigm on Euclidean domains falls short in providing appropriate tools for such data. Until recently, the lack of deep learning models capable of correctly handling non-Euclidean data has been a major obstacle in these fields. This special section addresses the need to bring together leading efforts in non-Euclidean deep learning across all communities. Of the papers that the special section received, twelve were selected for publication. The selected papers naturally fall into three distinct categories: (a) methodologies that advance machine learning on data represented as graphs, (b) methodologies that advance machine learning on manifold-valued data, and (c) applications of machine learning methodologies on non-Euclidean spaces in computer vision and medical imaging. We briefly review the accepted papers in each of the groups.

Archivio della Ricerca - Università degli Studi di Siena
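
The editorial describes user attributes as signals on the vertices of a graph. As one illustration of how a deep model can process such a signal, here is a minimal sketch of the symmetric-normalized propagation step popularized by graph convolutional networks (Kipf and Welling), x' = D^(-1/2) (A + I) D^(-1/2) x, where D is the degree matrix of A + I. This is a swapped-in example technique, not a method from the special section; the function name `normalized_propagation` is hypothetical, and the graph is assumed undirected with one scalar signal value per vertex.

```python
import math
from typing import Dict, Set


def normalized_propagation(adjacency: Dict[int, Set[int]],
                           signal: Dict[int, float]) -> Dict[int, float]:
    """One propagation step x' = D^(-1/2) (A + I) D^(-1/2) x on an undirected graph.

    `adjacency` maps each vertex to the set of its neighbours (no self-loops);
    the self-loop (the "+ I" term) is added here.
    """
    # Degrees of A + I: neighbour count plus one for the self-loop.
    degree = {v: len(nbrs) + 1 for v, nbrs in adjacency.items()}
    out: Dict[int, float] = {}
    for v, nbrs in adjacency.items():
        # Sum degree-normalized contributions from the vertex and its neighbours.
        out[v] = sum(signal[u] / math.sqrt(degree[v] * degree[u]) for u in nbrs | {v})
    return out


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with a unit signal on vertex 0.
    path = {0: {1}, 1: {0, 2}, 2: {1}}
    print(normalized_propagation(path, {0: 1.0, 1: 0.0, 2: 0.0}))
```

After one step the signal has spread from vertex 0 to its neighbour 1 but not yet to vertex 2; stacking such steps with learned weights is what lets a graph network aggregate information over larger neighbourhoods.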

Web Search of Fashion Items with Multimodal Querying

Author: Donahue J.
Han X.
Hsiao J.-H.
Karpathy A.
Laenen K.
Mikolov T.
Vinyals O.
Xu K.
Zhao B.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

In this paper, we introduce a novel multimodal fashion search paradigm in which e-commerce data is searched with a multimodal query composed of both an image and text. In this setting, the query image shows a fashion product that the user likes, and the query text allows the user to change certain product attributes to fit the product to the user’s desire. Multimodal search gives users the means to clearly express what they are looking for. This is in contrast to current e-commerce search mechanisms, which are cumbersome and often fail to grasp the customer’s needs. Multimodal search requires intermodal representations of visual and textual fashion attributes that can be mixed and matched to form the user’s desired product, together with a mechanism to indicate when a visual and a textual fashion attribute represent the same concept. With a neural network, we induce a common multimodal space for visual and textual fashion attributes in which their inner product measures their semantic similarity. We build a multimodal retrieval model that operates on the obtained intermodal representations and ranks images based on their relevance to a multimodal query. We demonstrate that our model is able to retrieve images that both exhibit the necessary query image attributes and satisfy the query texts. Moreover, we show that our model substantially outperforms two state-of-the-art retrieval models adapted to multimodal fashion search.
Status: accepted

Lirias

Crossref
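
The abstract describes a shared space in which the inner product of visual and textual attribute embeddings measures semantic similarity, and a model that ranks images against a multimodal query. A minimal sketch of ranking by inner product follows. The four-dimensional attribute vectors, the item ids, and the `compose_query` rule (an element-wise sum of the image and text embeddings) are hypothetical toy assumptions, not the paper's learned model.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, used as the similarity score in the shared space."""
    return sum(x * y for x, y in zip(a, b))


def compose_query(image_vec: Vector, text_vec: Vector) -> List[float]:
    # Hypothetical composition: the text embedding shifts the image embedding
    # toward the requested attributes via an element-wise sum.
    return [a + b for a, b in zip(image_vec, text_vec)]


def rank_by_inner_product(query: Vector, catalog: Dict[str, Vector]) -> List[str]:
    """Return catalog ids ordered from most to least similar to the query."""
    return sorted(catalog, key=lambda item: dot(query, catalog[item]), reverse=True)


if __name__ == "__main__":
    # Toy attribute axes: [red, blue, sleeveless, long-sleeved]
    liked_image = [1.0, 0.0, 1.0, 0.0]   # a red sleeveless dress
    text_edit = [-1.0, 1.0, 0.0, 0.0]    # "but in blue"
    catalog = {
        "red_sleeveless": [1.0, 0.0, 1.0, 0.0],
        "blue_sleeveless": [0.0, 1.0, 1.0, 0.0],
        "blue_long": [0.0, 1.0, 0.0, 1.0],
    }
    query = compose_query(liked_image, text_edit)
    print(rank_by_inner_product(query, catalog))
    # ['blue_sleeveless', 'red_sleeveless', 'blue_long']
```

The top result keeps the sleeveless attribute from the image while taking the colour from the text, which is the behaviour the abstract asks of a multimodal query.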

Dynamic Key-Value Memory Networks for Knowledge Tracing

Author: Chen T.
Corbett A. T.
Grefenstette E.
Joulin A.
Khajah M.
Krizhevsky A.
Maaten L. v. d.
Mikolov T.
Orr G. B.
Pardos Z. A.
Pascanu R.
Piech C.
Santoro A.
Sukhbaatar S.
Vinyals O.
Weston J.
Wilson K. H.
Xiong X.
Yudelson M. V.
Publication date: 17/02/2017

Knowledge Tracing (KT) is the task of tracing the evolving knowledge state of students with respect to one or more concepts as they engage in a sequence of learning activities. One important purpose of KT is to personalize the practice sequence to help students learn knowledge concepts efficiently. However, existing methods such as Bayesian Knowledge Tracing and Deep Knowledge Tracing either model the knowledge state for each predefined concept separately or fail to pinpoint exactly which concepts a student is good at or unfamiliar with. To solve these problems, this work introduces a new model called Dynamic Key-Value Memory Networks (DKVMN) that can exploit the relationships between underlying concepts and directly output a student's mastery level of each concept. Unlike standard memory-augmented neural networks that use a single memory matrix or two static memory matrices, our model has one static matrix, called key, which stores the knowledge concepts, and another, dynamic matrix, called value, which stores and updates the mastery levels of the corresponding concepts. Experiments show that our model consistently outperforms the state-of-the-art model on a range of KT datasets. Moreover, the DKVMN model can automatically discover the underlying concepts of exercises, a task typically performed by human annotators, and depict the changing knowledge state of a student.
Comment: To appear in 26th International Conference on World Wide Web (WWW), 201

arXiv.org e-Print Archive

Crossref
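
The abstract separates a static key matrix (concepts) from a dynamic value matrix (mastery levels). A minimal pure-Python sketch of the attention, read, and erase-then-add write steps, assuming the formulation commonly described for DKVMN: weights w(i) = softmax_i(k_t · M^k(i)), read r_t = sum_i w(i) M^v(i), and write M^v(i) <- M^v(i) * (1 - w(i) e_t) + w(i) a_t. Here the erase vector e_t and add vector a_t are passed in directly; in the model they are computed from the exercise-response embedding, which this sketch omits. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def correlation_weights(query_key: Vector, key_memory: Matrix) -> Vector:
    """Attention over concepts: softmax of the dot product with each static key slot."""
    return softmax([sum(q * k for q, k in zip(query_key, slot)) for slot in key_memory])


def read(weights: Vector, value_memory: Matrix) -> Vector:
    """Weighted sum of value slots: the mastery summary for the current exercise."""
    dim = len(value_memory[0])
    return [sum(w * slot[j] for w, slot in zip(weights, value_memory)) for j in range(dim)]


def write(weights: Vector, value_memory: Matrix, erase: Vector, add: Vector) -> Matrix:
    """Erase-then-add update of the dynamic value memory; the key memory is never modified."""
    return [
        [v * (1.0 - w * e) + w * a for v, e, a in zip(slot, erase, add)]
        for w, slot in zip(weights, value_memory)
    ]
```

Because only the value memory passes through `write`, the concept keys stay fixed while the per-concept mastery vectors change with each answered exercise, mirroring the static/dynamic split the abstract describes.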

Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error

Author: Dahl George E.
Faber Felix A.
Gilmer Justin
Huang Bing
Hutchison Luke
Kearnes Steven
Riley Patrick F.
Schoenholz Samuel S.
Vinyals Oriol
von Lilienfeld O. Anatole
Publication venue: 'American Chemical Society (ACS)'
Publication date: 01/01/2017

We investigate the impact of choosing regressors and molecular representations for the construction of fast machine learning (ML) models of 13 electronic ground-state properties of organic molecules. The performance of each regressor/representation/property combination is assessed using learning curves, which report out-of-sample errors as a function of training set size with up to ∼118k distinct molecules. Molecular structures and properties at the hybrid density functional theory (DFT) level of theory come from the QM9 database [Ramakrishnan et al. Sci. Data 2014, 1, 140022] and include enthalpies and free energies of atomization, HOMO/LUMO energies and gap, dipole moment, polarizability, zero-point vibrational energy, heat capacity, and the highest fundamental vibrational frequency. Various molecular representations have been studied (Coulomb matrix, bag of bonds, BAML, ECFP4, and molecular graphs (MG)), as well as newly developed distribution-based variants including histograms of distances (HD), angles (HDA/MARAD), and dihedrals (HDAD). Regressors include linear models (Bayesian ridge regression (BR) and linear regression with elastic net regularization (EN)), random forest (RF), kernel ridge regression (KRR), and two types of neural networks, graph convolutions (GC) and gated graph networks (GG). Out-of-sample errors depend strongly on the choice of representation, regressor, and molecular property. Electronic properties are typically best accounted for by MG and GC, while energetic properties are better described by HDAD and KRR. The specific combinations with the lowest out-of-sample errors in the ∼118k training set size limit are (free) energies and enthalpies of atomization (HDAD/KRR), HOMO/LUMO eigenvalues and gap (MG/GC), dipole moment (MG/GC), static polarizability (MG/GG), zero-point vibrational energy (HDAD/KRR), heat capacity at room temperature (HDAD/KRR), and highest fundamental vibrational frequency (BAML/RF).
We present numerical evidence that ML model predictions deviate from DFT (B3LYP) less than DFT (B3LYP) deviates from experiment for all properties. Furthermore, out-of-sample prediction errors with respect to the hybrid DFT reference are on par with, or close to, chemical accuracy. The results suggest that ML models could be more accurate than hybrid DFT if explicitly electron-correlated quantum (or experimental) data were available.

arXiv.org e-Print Archive

edoc

FigShare
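
The Coulomb matrix is the first molecular representation listed in the abstract. A minimal sketch, assuming the standard definition of Rupp et al. (2012): diagonal entries 0.5 * Z_i^2.4 and off-diagonal entries Z_i * Z_j / |R_i - R_j|, where Z are nuclear charges and R are Cartesian positions. Distances are used in whatever unit the positions are given (atomic units are common), and the sorting or eigenvalue post-processing usually applied for permutation invariance is omitted. The function name `coulomb_matrix` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]


def coulomb_matrix(charges: Sequence[int], positions: Sequence[Position]) -> List[List[float]]:
    """Symmetric n x n Coulomb matrix for a molecule with n atoms."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: polynomial fit to the energy of the isolated atom.
                m[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between nuclei i and j.
                m[i][j] = charges[i] * charges[j] / math.dist(positions[i], positions[j])
    return m


if __name__ == "__main__":
    # H2 with a bond length of 0.74 (Angstrom, chosen for illustration).
    for row in coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)]):
        print(row)
```

The representation depends only on charges and interatomic distances, so it is invariant to translating or rotating the molecule, though not to reordering its atoms.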

Weighing Counts: Sequential Crowd Counting by Reinforcement Learning

Author: A Hussein
D Silver
H Idrees
H Lu
H Xiong
IH Laradji
L Liu
L Van Hove
M Riedmiller
O Vinyals
R Guerrero-Gómez-Olmedo
T Stahl
V Mnih
Publication date: 01/01/2020

We formulate counting as a sequential decision problem and present a novel crowd counting model solvable by deep reinforcement learning. In contrast to existing counting models that directly output count values, we divide one-step estimation into a sequence of much easier and more tractable sub-decision problems. This sequential decision nature corresponds exactly to a real-world physical process: scale weighing. Inspired by scale weighing, we propose a novel 'counting scale' termed LibraNet, in which the count value is analogized by weight. By virtually placing a crowd image on one side of a scale, LibraNet (the agent) sequentially learns to place appropriate weights on the other side to match the crowd count. At each step, LibraNet chooses one weight (action) from the weight box (the pre-defined action pool) according to the current crowd image features and the weights placed on the scale pan (state). LibraNet is required to learn to balance the scale according to the feedback of the needle (Q values). We show that LibraNet exactly implements scale weighing by visualizing how LibraNet chooses actions during the decision process. Extensive experiments demonstrate the effectiveness of our design choices and report state-of-the-art results on a few crowd counting benchmarks. We also demonstrate good cross-dataset generalization of LibraNet. Code and models are made available at: https://git.io/libranet
Comment: Accepted to Proc. Eur. Conf. Computer Vision (ECCV) 202

arXiv.org e-Print Archive

Crossref

Adelaide Research & Scholarship
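
The weighing analogy can be made concrete by replacing LibraNet's learned Q-values with an oracle that knows the true count. A minimal sketch, assuming a hypothetical weight box of positive integer weights and a greedy rule (place the largest weight that does not overshoot the remaining gap); it only illustrates how one count estimate decomposes into a sequence of small actions. LibraNet itself chooses each weight from image features and the current pan state, without access to the true count.

```python
from typing import List, Sequence

# Hypothetical action pool; LibraNet's actual weight box is defined in the paper.
WEIGHT_BOX = (30, 10, 3, 1)


def greedy_weighing(target_count: int,
                    weight_box: Sequence[int] = WEIGHT_BOX,
                    max_steps: int = 100) -> List[int]:
    """Sequence of weights an oracle would place to balance the scale."""
    placed: List[int] = []
    total = 0
    for _ in range(max_steps):
        gap = target_count - total
        if gap == 0:
            break  # the needle is centred: the scale is balanced
        # Stand-in for the needle feedback: any weight that does not tip the scale.
        candidates = [w for w in weight_box if w <= gap]
        if not candidates:
            break
        w = max(candidates)
        placed.append(w)
        total += w
    return placed


if __name__ == "__main__":
    print(greedy_weighing(47))  # [30, 10, 3, 3, 1]
```

Each step is a small, easy decision, and their sum reproduces the count, which is the decomposition the abstract contrasts with one-shot regression of the count value.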

Pareto multi-task deep learning

Author: D Dyankov
D Silver
DE Rumelhart
G Stracquadanio
K De Jong
K Stanley
O Vinyals
V Mnih
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Neuroevolution has been used to train Deep Neural Networks on reinforcement learning problems. A few attempts have been made to extend it to either multi-task or multi-objective optimization problems. This research work presents the Multi-Task Multi-Objective Deep Neuroevolution method, a highly parallelizable algorithm that can be adopted for tackling both multi-task and multi-objective problems. In this method, prior knowledge of the tasks is used to explicitly define multiple utility functions, which are optimized simultaneously. Experimental results on some Atari 2600 games, a challenging testbed for deep reinforcement learning algorithms, show that a single neural network with a single set of parameters can outperform previous state-of-the-art techniques. In addition to the standard analysis, all results are also evaluated using the hypervolume indicator and the Kullback-Leibler divergence to gain better insight into the underlying training dynamics. The experimental results show that a neural network trained with the proposed evolution strategy can outperform networks trained individually on each of the tasks.

Central Archive at the University of Reading

Crossref
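
The abstract evaluates results with the hypervolume indicator. For two objectives that are both maximized, the hypervolume of a set of points is the area they jointly dominate, bounded below by a reference point. A minimal sketch of the standard sweep computation follows; the function name and the assumption that both objectives are maximized (as game scores are) are illustrative, and the paper's reference point is not given here.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], reference: Point) -> float:
    """Area dominated by `points` and bounded below by `reference` (both objectives maximized)."""
    rx, ry = reference
    # Only points that improve on the reference in both objectives contribute.
    pts = [(x, y) for x, y in points if x > rx and y > ry]
    # Sweep from the largest first objective downwards.
    pts.sort(key=lambda p: p[0], reverse=True)
    volume = 0.0
    best_y = ry
    for x, y in pts:
        if y > best_y:
            # Add the new horizontal strip this point dominates beyond earlier points.
            volume += (x - rx) * (y - best_y)
            best_y = y
    return volume


if __name__ == "__main__":
    front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0)]  # (1, 1) is dominated
    print(hypervolume_2d(front, (0.0, 0.0)))  # 6.0
```

A larger hypervolume means the set of solutions covers more of the trade-off between objectives, which is why it complements per-task scores when comparing multi-objective methods.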

Warm-Start AlphaZero Self-Play Search Enhancements

Author: C Browne
CD Rosin
D Silver
EA Heinz
G Tesauro
H Wang
J Schmidhuber
J Tao
LV Allis
M Buro
MA Wiering
ML Zhang
N Justesen
N Srivastava
O Vinyals
R Coulom
RD Gaina
S Gelly
S Iwata
S Reisch
SY Chong
TP Runarsson
V Mnih
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/04/2020

Recently, AlphaZero has achieved landmark results in deep reinforcement learning by providing a single self-play architecture that learned three different games at a superhuman level. AlphaZero is a large and complicated system with many parameters, and success requires much compute power and fine-tuning. Reproducing results in other games is a challenge, and many researchers are looking for ways to improve results while reducing computational demands. AlphaZero's design is purely based on self-play and makes no use of labeled expert data or domain-specific enhancements; it is designed to learn from scratch. We propose a novel approach to deal with this cold-start problem by employing simple search enhancements at the beginning phase of self-play training, namely Rollout, Rapid Action Value Estimate (RAVE) and dynamically weighted combinations of these with the neural network, and Rolling Horizon Evolutionary Algorithms (RHEA). Our experiments indicate that most of these enhancements improve the performance of their baseline player in three different (small) board games, with RAVE-based variants playing especially strongly.

arXiv.org e-Print Archive

Crossref

Leiden University Scholarly Publications
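
RAVE blends a node's Monte Carlo value with its all-moves-as-first (AMAF) value, trusting the AMAF estimate while visits are few and the Monte Carlo estimate as visits accumulate. A minimal sketch, assuming the hand-tuned schedule beta = sqrt(k / (3n + k)) from Gelly and Silver's work on computer Go, where n is the visit count; the equivalence parameter k = 1000 is a hypothetical default, and the paper's dynamically weighted combinations with the neural network may use a different schedule.

```python
import math


def rave_beta(visits: int, k: float = 1000.0) -> float:
    """Weight on the AMAF estimate: 1.0 with no visits, decaying toward 0 as visits grow."""
    return math.sqrt(k / (3 * visits + k))


def blended_value(q_mc: float, q_rave: float, visits: int, k: float = 1000.0) -> float:
    """Mix of the Monte Carlo value and the RAVE (AMAF) value for one action."""
    beta = rave_beta(visits, k)
    return (1.0 - beta) * q_mc + beta * q_rave


if __name__ == "__main__":
    for n in (0, 100, 1000, 10000):
        print(n, round(blended_value(q_mc=0.2, q_rave=0.8, visits=n), 3))
```

The appeal for a cold start is that AMAF statistics are shared across many simulations, so they give a usable estimate long before an untrained network or a sparsely visited node does.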